ror could vary with each subject or even with each observation, so the
errors are heteroscedastic. In this paper, we propose a fast algorithm using a
simulation-extrapolation (SIMEX) method to recover the unknown density
in the case of heteroscedastic contamination. We show the consistency of
the estimator, obtain its asymptotic variance, and then address the
practical selection of the smoothing parameter. We demonstrate, through a
finite-sample simulation study, that the proposed method performs better than
the Fourier-type deconvolution method in terms of the integrated squared error
criterion. Finally, a real data application is conducted to illustrate the use
of the method.
1. Introduction

A fundamental problem in measurement error models is to recover the unknown
density of a variable when its observed values are contaminated with
errors. The ever-increasing interest in the problem comes from the growing
number of medical and financial studies in which variables are observed with
measurement errors or are only partially available. Formally, let $X$ be the
variable of interest, which we cannot observe directly. Instead, we observe a
sample $Y_1, \ldots, Y_n$ drawn independently from

$$Y = X + U, \tag{1}$$

where the measurement error $U$ is independent of $X$. One is interested in
estimating the unknown density function of $X$. The distribution of $U$ is
typically assumed known or can be estimated separately.

Denote the density functions of $X$, $U$ and $Y$ by $f_X$, $f_U$ and $f_Y$, and
their characteristic functions by $\varphi_X$, $\varphi_U$ and $\varphi_Y$,
respectively. If the measurement errors are ignored, then a naive estimator of
$f_X(x)$ is the ordinary kernel estimate,

$$\hat f_{X,naive}(x) = \frac{1}{nh}\sum_{j=1}^{n} K\left(\frac{x - Y_j}{h}\right) = \frac{1}{n}\sum_{j=1}^{n} K_h(x - Y_j), \tag{2}$$

where $K_h(x) = K(x/h)/h$ and $K(\cdot)$ is a symmetric probability kernel with
finite variance, $\int x^2 K(x)\,dx < \infty$. It is clear that
$\hat f_{X,naive}(x)$ is a biased estimator of $f_X(x)$ with

$$E(\hat f_{X,naive}(x)) = f_X * f_U(x),$$

where "$*$" denotes convolution. Hence, finding a consistent estimator of
$f_X(x)$ requires deconvoluting the density of the measurement errors, $f_U(x)$.
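
The bias of the naive estimator can be checked numerically. The following is a minimal sketch only, assuming a Gaussian kernel, $X \sim N(0,1)$ and $U \sim N(0, 0.8^2)$; these choices, the sample size, and the function name `naive_kde` are illustrative assumptions and are not taken from the paper. For this setting the naive estimate at $x = 0$ should sit near $(f_X * f_U)(0)$ rather than near $f_X(0)$.

```python
import numpy as np

def naive_kde(x, y, h):
    """Ordinary Gaussian-kernel estimate (1/n) * sum_j K_h(x - Y_j)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = (x[:, None] - y[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(y) * h * np.sqrt(2.0 * np.pi))

# Illustrative (assumed) setting: X ~ N(0, 1), U ~ N(0, 0.8^2).
rng = np.random.default_rng(0)
n, sigma_u = 20000, 0.8
x_true = rng.normal(0.0, 1.0, n)               # unobserved variable
y_obs = x_true + rng.normal(0.0, sigma_u, n)   # contaminated observations

h = 1.06 * y_obs.std(ddof=1) * n ** (-0.2)     # a rule-of-thumb bandwidth
f_naive_0 = naive_kde(0.0, y_obs, h)[0]

f_x_0 = 1.0 / np.sqrt(2.0 * np.pi)                          # f_X(0)
f_conv_0 = 1.0 / np.sqrt(2.0 * np.pi * (1.0 + sigma_u**2))  # (f_X * f_U)(0)
print(f_naive_0, f_conv_0, f_x_0)
```

In this sketch the convolution is available in closed form because both densities are normal, which is what makes the comparison possible without any deconvolution.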

The usual deconvolution procedure is to apply a Fourier inversion to
$\varphi_X(t)$:

$$f_X(x) = \frac{1}{2\pi}\int e^{-itx}\varphi_X(t)\,dt = \frac{1}{2\pi}\int e^{-itx}\frac{\varphi_Y(t)}{\varphi_U(t)}\,dt. \tag{3}$$

Of course, $\varphi_Y$ in (3), if unknown, needs to be estimated. Indeed,
replacing $\varphi_Y(t)$ in (3) by its kernel estimate

$$\hat\varphi_Y(t) = \int e^{itx}\hat f_Y(x)\,dx = \frac{1}{nh}\sum_{j=1}^{n}\int e^{it(x-Y_j)} K\left(\frac{x-Y_j}{h}\right) e^{itY_j}\,dx = \frac{1}{n}\sum_{j=1}^{n}\varphi_K(th)\,e^{itY_j}$$

leads to the Deconvoluting Kernel Estimator (DKE) of $f_X$, first introduced by
[21]:

$$\hat f_{X,DKE}(x) = \frac{1}{nh}\sum_{j=1}^{n} K^*\left(\frac{x-Y_j}{h}\right), \tag{4}$$

where

$$K^*(z) = \frac{1}{2\pi}\int e^{-itz}\frac{\varphi_K(t)}{\varphi_U(t/h)}\,dt \tag{5}$$

is called the deconvoluting kernel. Observe that the deconvoluting kernel
estimate in (4) is just an ordinary kernel estimate but with the specific kernel
function in (5). We call any estimator of $f_X(x)$ that requires a Fourier
inversion or transformation a Fourier-type estimator.
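
To make the Fourier-type construction concrete, the following is a minimal sketch of a deconvoluting kernel estimate for homoscedastic normal errors. It assumes $U \sim N(0, \sigma_U^2)$, so that $1/\varphi_U(t/h) = \exp(\sigma_U^2 t^2 / (2h^2))$, and a kernel whose characteristic function is $(1 - t^2)^3$ on $[-1, 1]$; the function names, the bandwidth, and the quadrature grid are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def deconv_kernel(z, h, sigma_u, n_grid=801):
    """K*(z) = (1/2pi) * int e^{-itz} phi_K(t) / phi_U(t/h) dt, normal error.

    Assumes phi_K(t) = (1 - t^2)^3 on [-1, 1]. The integrand is even in t,
    so we integrate cos(t z) over [0, 1] and divide by pi.
    """
    z = np.atleast_1d(np.asarray(z, dtype=float))
    t = np.linspace(0.0, 1.0, n_grid)
    phi_k = (1.0 - t**2) ** 3
    inv_phi_u = np.exp(0.5 * (sigma_u * t / h) ** 2)
    f = np.cos(np.outer(z, t)) * phi_k * inv_phi_u
    dt = t[1] - t[0]
    # Trapezoid rule along the t axis.
    return dt * (f.sum(axis=1) - 0.5 * (f[:, 0] + f[:, -1])) / np.pi

def dke(x, y, h, sigma_u):
    """Deconvoluting kernel estimate (1/(n h)) * sum_j K*((x - Y_j)/h)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = (x[:, None] - y[None, :]) / h
    k = deconv_kernel(z.ravel(), h, sigma_u).reshape(z.shape)
    return k.mean(axis=1) / h

# Illustrative (assumed) data: X ~ N(0, 1) observed with N(0, 0.4^2) error.
rng = np.random.default_rng(4)
y = rng.normal(0.0, 1.0, 1000) + rng.normal(0.0, 0.4, 1000)
est = dke(np.array([0.0, 1.0]), y, h=0.6, sigma_u=0.4)
print(est)
```

With `sigma_u = 0` the deconvoluting kernel reduces to the ordinary kernel $K$, which gives a simple sanity check on the quadrature.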

There has been substantial literature on the Fourier-type deconvoluting ker- 
nel estimators. See, for instance, [21], [1], [15], [10], [9], [26], [5] and [25]. The 
difficulty with a deconvolution problem depends heavily on the smoothness of 
the error density fu. Fan [10] characterized two types of error distributions: ordi- 
nary smooth and super-smooth distributions. Examples of the ordinary smooth 
distribution include gamma, symmetric gamma and Laplacian distributions; ex- 
amples of the super-smooth distribution include normal, mixture normal and 
Cauchy distributions. The convergence rate of a DKE is very slow in the case of
a super-smooth error distribution. For example, when $U$ is normally distributed,
the convergence rate is only $O((\log n)^{-1/2})$ [27; 10].

Further, in many real applications, the distribution of the measurement error
could vary with each subject or even with each observation, so the errors are
heteroscedastic. Consideration of heteroscedastic errors can be traced back at
least to Fuller [12] (Chapter 3) in 1987, in the special case of linear
regression where predictors are measured with error. In a recent book, Carroll
et al. [3] discussed systematically the state of the art in measurement error
models, including nonlinear regression with heteroscedastic errors in
variables. Delaigle and Meister [8] proposed an adjusted Fourier deconvolution
estimator for density estimation with heteroscedastic errors. They also applied
the adjusted method to the nonparametric regression problem [7]. Staudenmayer
et al. [20] considered a different type of model where the observed data are
the sample means of replicates contaminated with heteroscedastic errors. They
presented a spline-based density estimation method using Markov chain Monte
Carlo and a random-walk Metropolis-Hastings algorithm. Sun et al. [24] proposed
new non-Fourier estimators of the density function when errors are
homoscedastic or heteroscedastic but uniformly distributed. The new estimators
abandon characteristic functions: no Fourier transformations are needed in the
calculation.

In this paper, we provide a new density estimation procedure for data
contaminated with additive and heteroscedastic Gaussian measurement error (§2),
without using a Fourier transform. Our resulting estimator is a "variable
bandwidth type" kernel estimator, adopting the simulation-extrapolation (SIMEX)
idea [22], though the simulation step in the original SIMEX algorithm is
bypassed in our procedure. It is asymptotically unbiased and consistent
(Proposition 2.1). The practical selection of the smoothing parameter is
addressed in §3. Our estimator is computationally faster than the Fourier-type
DKE proposed by [8]. It also has a competitive and often smaller integrated
squared error than the DKE does (§4). An application to an astronomy data set
using both our method and the DKE method is given in §5. The article ends with
concluding remarks in §6. The proof of Proposition 2.1 is in the Appendix.

2. Estimation procedure and asymptotics 

SIMEX, first proposed by Cook and Stefanski [4], is a "jackknife"-type
bias-adjusted method that has been widely applied in regression problems with
measurement error. Cook and Stefanski [4; 23] applied the SIMEX algorithm to
parametric regression problems. Carroll et al. [2] and Staudenmayer and Ruppert
[19] discussed nonparametric regression in the presence of measurement error
using the SIMEX method. Stefanski and Bay [22] focused on SIMEX estimation
of a finite population cumulative distribution function (rather than a density)
when sample units are measured with error. For a more complete discussion of
SIMEX, see the monograph [3].

The key idea underlying SIMEX is the fact that the effect of measurement
error on an estimator can be determined experimentally via simulation. SIMEX
methods in both parametric and nonparametric regression consist of two steps:
a simulation step and an extrapolation step. In the simulation step, additional
independent measurement errors with known variance (typically denoted by
$\lambda\sigma^2$, where $\lambda$ is a parameter controlling the amount of
added measurement error) are generated and added to the original data, thereby
creating "pseudo" data sets with successively larger measurement error
variances. The total measurement error variance is then
$(1 + \lambda_s)\sigma^2$ for the $s$th pseudo data set. The "pseudo
estimators" are obtained from each of the generated "pseudo" data sets. The
above simulation and estimation are repeated a large number of times, and the
average value of the estimators for each level of contamination is calculated.
Then, in the extrapolation step, a regression technique, such as nonlinear
least squares, is used to fit the trend between the pseudo estimators and the
controlling parameter $\lambda$ of the added errors. Finally, extrapolation to
the ideal case of no measurement error (i.e. $\lambda = -1$) yields the SIMEX
estimator.
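
The two steps can be illustrated on a deliberately simple target. The sketch below is not the density estimator of this paper: it applies generic SIMEX to the variance of $X$, where the naive sample variance of the contaminated data has expectation $\sigma_X^2 + (1 + \lambda)\sigma_U^2$ after inflation, so the trend in $\lambda$ is known. All numerical settings (sample size, $\sigma_U = 0.6$, the grid of $\lambda$ values, the number of replicates) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma_u = 5000, 0.6
x = rng.normal(0.0, 1.0, n)            # unobserved, Var(X) = 1
w = x + rng.normal(0.0, sigma_u, n)    # observed with error

lambdas = np.linspace(0.0, 2.0, 9)     # levels of added error (assumed grid)
n_rep = 200                            # pseudo data sets per level

def pseudo_estimate(lam):
    """Simulation step: average the naive estimate over inflated data sets."""
    noise = rng.normal(0.0, np.sqrt(lam) * sigma_u, size=(n_rep, n))
    return np.var(w[None, :] + noise, axis=1, ddof=1).mean()

est = np.array([pseudo_estimate(lam) for lam in lambdas])

# Extrapolation step: fit a quadratic in lambda and evaluate at lambda = -1.
coef = np.polyfit(lambdas, est, deg=2)
simex_var = np.polyval(coef, -1.0)
naive_var = np.var(w, ddof=1)
print(naive_var, simex_var)
```

Here the naive estimate is inflated by roughly $\sigma_U^2$, while the extrapolated value should fall close to the variance of the unobserved $X$.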

To investigate a SIMEX algorithm for density estimation from data contaminated
with errors, we consider a general heteroscedastic measurement error
model. We generalize model (1) to

$$Y_j = X_j + U_j, \tag{6}$$

where $j = 1, \ldots, n$, $X_j \sim f_X$ and $U_j \sim f_{U_j}$. Since the
normal distribution is frequently used in applications, we focus on the
typical super-smooth heteroscedastic error case: $U_j \sim N(0, \sigma_j^2)$
for $j = 1, \ldots, n$. Hence, $f_{U_j}$ is completely specified if $\sigma_j$
is. Clearly, if $\sigma_1 = \cdots = \sigma_n = \sigma$, the errors are
homoscedastic; otherwise, they are heteroscedastic. The homoscedastic model is
thus a special case of the heteroscedastic model.

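Under the model setting (6), we now consider estimating the unknown density
function $f_X(t)$ at a given $t$. By the general simulation-extrapolation
algorithm, estimators are re-computed on a large number $m$ of measurement
error-inflated pseudo data sets, $\{Y_{jr}(\lambda)\}_{j=1}^{n}$, where

$$Y_{jr}(\lambda) = Y_j + \lambda^{1/2} U_{jr}, \quad j = 1, \ldots, n, \quad r = 1, \ldots, m,$$

$U_{jr}$ are independent pseudo-random variables with density $f_{U_j}$, and
$\lambda \ge 0$ is a constant controlling the amount of added error.

The conventional kernel estimator of the density function from the $r$th
variance-inflated data set is

$$\hat g_r(t) = \frac{1}{n}\sum_{j=1}^{n} K_h(t - Y_{jr}(\lambda)).$$

By the SIMEX algorithm, the simulation and estimation steps are repeated
a large number of times, and the average value of the estimators for each level
of contamination is calculated:

$$\hat g(t) = \frac{1}{m}\sum_{r=1}^{m}\hat g_r(t) = \frac{1}{m}\sum_{r=1}^{m}\left(\frac{1}{n}\sum_{j=1}^{n} K_h(t - Y_j - \lambda^{1/2}U_{jr})\right). \tag{7}$$

Write $U_{jr} = \sigma_j W_{jr}$, where $W_{jr}$ is standard normal. By the law
of large numbers,

$$\hat g(t) \to E(\hat g_r(t) \mid Y_1, \ldots, Y_n) = \frac{1}{n}\sum_{j=1}^{n} E\left(K_h(t - Y_j - \sigma_j\lambda^{1/2}W_{jr}) \mid Y_j\right)$$

as $m \to \infty$. Denote by $\phi(\cdot)$ the density function of the standard
normal distribution and let $v = -(t - Y_j - \sigma_j\lambda^{1/2}w)/h$. Then,
as $h \to 0$,

$$\begin{aligned}
E\left(K_h(t - Y_j - \sigma_j\lambda^{1/2}W_{jr}) \mid Y_j\right)
&= \int K_h(t - Y_j - \sigma_j\lambda^{1/2}w)\,\phi(w)\,dw \\
&= \frac{1}{\sigma_j\lambda^{1/2}}\int K(v)\,\phi\left(\frac{t - Y_j + vh}{\sigma_j\lambda^{1/2}}\right)dv \\
&= \frac{1}{\sigma_j\lambda^{1/2}}\int K(v)\left[\phi\left(\frac{t - Y_j}{\sigma_j\lambda^{1/2}}\right) + \frac{vh}{\sigma_j\lambda^{1/2}}\,\phi'\left(\frac{t - Y_j}{\sigma_j\lambda^{1/2}}\right) + o(h)\right]dv \\
&= \frac{1}{\sigma_j\lambda^{1/2}}\,\phi\left(\frac{t - Y_j}{\sigma_j\lambda^{1/2}}\right) + o(h).
\end{aligned}$$

Averaging over $j$ and letting $h \to 0$ gives

$$g^*(t, \lambda) = \frac{1}{n}\sum_{j=1}^{n}\frac{1}{\sigma_j\lambda^{1/2}}\,\phi\left(\frac{t - Y_j}{\sigma_j\lambda^{1/2}}\right). \tag{8}$$

Therefore, the simulation step can be bypassed in our SIMEX algorithm for
density estimation. The above derivation of (8) is similar to that for
cumulative distribution function estimation in [22]. We use $g^*$ in (8) to
replace $\hat g$ in (7) in our estimation.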
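
The claim that the simulation step can be bypassed can be checked numerically. The sketch below compares a Monte Carlo average over pseudo data sets, as in (7), with the closed form that (8) describes, namely a normal kernel with per-observation bandwidth $\sigma_j\lambda^{1/2}$. It assumes a Gaussian kernel $K$, a small bandwidth $h = 0.1$, and simulated data; the function names and all numerical settings are hypothetical and chosen only for illustration.

```python
import numpy as np

def g_star(t, y, sigma, lam):
    """Closed form: average of normal densities with bandwidth sigma_j*sqrt(lam)."""
    b = sigma * np.sqrt(lam)
    z = (t - y) / b
    return np.mean(np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * b))

def g_simulated(t, y, sigma, lam, h, m, rng):
    """Monte Carlo average over m pseudo data sets, Gaussian kernel K_h."""
    w = rng.standard_normal((m, len(y)))
    pseudo = y[None, :] + np.sqrt(lam) * sigma[None, :] * w
    k = np.exp(-0.5 * ((t - pseudo) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    return k.mean()

# Illustrative (assumed) heteroscedastic data.
rng = np.random.default_rng(2)
n = 200
sigma = rng.uniform(0.4, 0.6, n)
y = rng.normal(0.0, 1.0, n) + sigma * rng.standard_normal(n)

closed = g_star(0.0, y, sigma, lam=1.0)
simulated = g_simulated(0.0, y, sigma, lam=1.0, h=0.1, m=2000, rng=rng)
print(closed, simulated)
```

The two values differ only by Monte Carlo noise and the small-$h$ smoothing term, while the closed form needs no random number generation at all.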
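We then calculate the quantity in (8) for a pre-determined sequence of
$\lambda$ values, i.e. $0 < \lambda_1 < \lambda_2 < \cdots < \lambda_s$. The
success of the SIMEX technique depends on the assumption that the expectation
of $g^*$ is well approximated by a nonlinear function of $\lambda$. Here we
consider a standard quadratic function of $\lambda$,

$$E(g^*(t, \lambda)) = \beta_0 + \beta_1\lambda + \beta_2\lambda^2.$$

Our SIMEX estimator of the unknown density $f_X$ without measurement error
is then obtained by the extrapolation step, i.e. letting $\lambda \to -1$:

$$\hat f_{X,SIMEX}(t) = \lim_{\lambda \to -1}\hat E(g^*(t, \lambda)) = \hat\beta_0 - \hat\beta_1 + \hat\beta_2, \tag{9}$$

where $\hat\beta_0$, $\hat\beta_1$ and $\hat\beta_2$ are the least-squares
estimates of the coefficients. Next, we can rewrite the SIMEX estimator in the
form

$$\hat f_{X,SIMEX}(t) = \frac{1}{n}\sum_{j=1}^{n} G_j(t \mid Y_j, \sigma_j, \Lambda), \tag{10}$$

where

$$G_j(t \mid Y_j, \sigma_j, \Lambda) = (1, -1, 1)(P^T P)^{-1} P^T\,\Phi_j(t \mid Y_j, \sigma_j, \Lambda),$$

$$P = (1, \Lambda, \Lambda^2), \quad \Lambda = (\lambda_1, \ldots, \lambda_s)^T, \quad \Lambda^2 = (\lambda_1^2, \ldots, \lambda_s^2)^T,$$

$$\Phi_j(t \mid Y_j, \sigma_j, \Lambda) = \left(\frac{1}{\sigma_j\lambda_1^{1/2}}\,\phi\left(\frac{t - Y_j}{\sigma_j\lambda_1^{1/2}}\right), \ldots, \frac{1}{\sigma_j\lambda_s^{1/2}}\,\phi\left(\frac{t - Y_j}{\sigma_j\lambda_s^{1/2}}\right)\right)^T.$$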
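
The extrapolation form of the estimator can be sketched in a few lines. The code below is an illustrative reading of the SIMEX density estimator, assuming that $g^*(t, \lambda)$ is the average of normal densities with bandwidth $\sigma_j\lambda^{1/2}$ and that the quadratic extrapolant is fitted by ordinary least squares on a grid $\lambda_1 < \cdots < \lambda_s$. The function names, the value $\lambda_1 = 0.5$, and the simulated data are assumptions for illustration; it is not the authors' R code.

```python
import numpy as np

def extrapolation_weights(lambdas):
    """Row vector (1, -1, 1)(P'P)^{-1}P' for the quadratic design P = (1, l, l^2)."""
    lam = np.asarray(lambdas, dtype=float)
    P = np.vander(lam, 3, increasing=True)      # columns: 1, lambda, lambda^2
    return np.array([1.0, -1.0, 1.0]) @ np.linalg.pinv(P)

def simex_density(t, y, sigma, lambdas):
    """Quadratic SIMEX extrapolation of g*(t, lambda) to lambda = -1."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    lam = np.asarray(lambdas, dtype=float)
    b = sigma[:, None] * np.sqrt(lam)[None, :]               # (n, s) bandwidths
    z = (t[:, None, None] - y[None, :, None]) / b[None, :, :]
    dens = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * b[None, :, :])
    g = dens.mean(axis=1)                                    # g*(t, lambda_k)
    return g @ extrapolation_weights(lam)

# Illustrative (assumed) data: X ~ N(0, 1), sigma_j ~ U(0.4, 0.6).
rng = np.random.default_rng(3)
n = 5000
sigma = rng.uniform(0.4, 0.6, n)
y = rng.normal(0.0, 1.0, n) + sigma * rng.standard_normal(n)

lambdas = np.linspace(0.5, 3.5, 50)    # s = 50, lambda_s = lambda_1 + 3
f_hat_0 = simex_density(0.0, y, sigma, lambdas)[0]
print(f_hat_0, 1.0 / np.sqrt(2.0 * np.pi))
```

Because the weights depend only on the grid of $\lambda$ values, they can be computed once and reused for every evaluation point $t$, which is one reason the procedure is cheap.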

The following proposition shows that our SIMEX estimator (10) is an
asymptotically unbiased estimator of the unknown density $f_X$.

Proposition 2.1. Suppose that (a) the polynomial extrapolant is exact; (b) $f_X$
has a bounded and continuous derivative; and (c) $\sigma_j \le \sigma_0$ for all
$j$ and some $\sigma_0 < \infty$. Then the SIMEX estimator
$\hat f_{X,SIMEX}(t)$ in (10) is a consistent and asymptotically unbiased
estimator of the density function $f_X$ of the unobserved sample $X$. The
asymptotic variance of the estimator is

$$Var(\hat f_{X,SIMEX}(t)) = \frac{f_X(t)}{\sqrt{2\pi}\,n\,\sigma_H}\,c(\Lambda)\,\Sigma_a\,c(\Lambda)^T,$$

where $\sigma_H = n / \sum_{j=1}^{n}\sigma_j^{-1}$,
$c(\Lambda) = (1, -1, 1)(P^T P)^{-1}P^T$ and $\Sigma_a$ is an $s \times s$
matrix with $(l, m)$th element equal to $(\lambda_l + \lambda_m)^{-1/2}$
$(l, m = 1, \ldots, s)$.

The proof of this proposition is given in the Appendix. □
Remark 2.1. The asymptotic variance can be consistently estimated by replacing
$f_X(t)$ with $\hat f_{X,SIMEX}(t)$ in the above formula. The asymptotic
variance of $\hat f_{X,SIMEX}(t)$ equals the variance of a kernel estimator of
$f_X(t)$ with adaptive bandwidth, multiplied by a constant which depends not on
the random sample but on the selected values of $\lambda$. This fact enables us
to choose an extrapolant that minimizes the asymptotic variance. The asymptotic
normality of the SIMEX estimator is easily seen from the structure of the
proposed SIMEX estimator. A confidence interval can be obtained
correspondingly.
Remark 2.2. The asymptotics of the proposed estimator are based on the
assumption that the polynomial extrapolant is exact. This is a typical
assumption of SIMEX methods (see for example [2]), although it is difficult to
verify in real data analysis. We will show, in the simulation study, that it is
reasonable even for complex models.

Remark 2.3. It is possible that the SIMEX density estimate takes negative
values in small or sparse data regions, especially in the tails. This behavior
is common in many classes of density estimators, such as wavelet density
estimators, sinc kernel estimators, and spline estimators. This disadvantage
will not affect the global performance of our estimate. A simple correction of
our SIMEX estimator is

$$\tilde f_{X,SIMEX}(t) = \max\{\hat f_{X,SIMEX}(t), 0\}.$$



3. Choice of the smoothing parameter 

Our SIMEX estimator in (8) has the form of a variable-bandwidth kernel
estimator [14; 13], where $\sigma_j\lambda^{1/2}$ plays the role of the variable
bandwidth in the estimator. Thus, $\lambda$ can be considered as a smoothing
parameter that determines the smoothness of the SIMEX estimating function. Our
proposed SIMEX procedure requires specification of
$0 < \lambda_1 < \lambda_2 < \cdots < \lambda_s$. The experience from our large
simulation studies suggests that the choice of $s$ is not critical, and neither
is that of $\lambda_s$ once $\lambda_1$ is determined. A typical choice is
taking $s = 50$, $\lambda_s = \lambda_1 + 3$ and
$\lambda_2, \ldots, \lambda_{s-1}$ equally spaced points in
$(\lambda_1, \lambda_s)$. We propose two methods to select $\lambda_1$ here: one
minimizes the mean integrated squared error (MISE) and the other is similar to
Silverman's rule-of-thumb method [18].

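Based on Proposition 2.1, the MISE of the SIMEX density estimator, which
depends on the parameter $\Lambda$, is

$$\begin{aligned}
MISE(\hat f_{X,SIMEX}, \Lambda) &= E\left(\int (\hat f_{X,SIMEX}(t) - f_X(t))^2\,dt\right) \\
&= \int (Bias(\hat f_{X,SIMEX}(t)))^2\,dt + \int Var(\hat f_{X,SIMEX}(t))\,dt \\
&\approx \int Var(\hat f_{X,SIMEX}(t))\,dt = \frac{c(\Lambda)\,\Sigma_a\,c(\Lambda)^T}{\sqrt{2\pi}\,n\,\sigma_H}.
\end{aligned}$$

Since only $\lambda_1$ is critical to the MISE of the SIMEX estimator, we
propose to choose the parameter $\lambda_1$, and hence the bandwidth of the
estimator, through the minimization of $c(\Lambda)\,\Sigma_a\,c(\Lambda)^T$.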

Another method to select the bandwidth is a rule-of-thumb method.
For example, the bandwidth of a kernel density estimator is

$$h_{Y,rot} = a_0 \min\left\{\hat\sigma_Y, \frac{R_Y}{1.34}\right\} n^{-1/5},$$

where $\hat\sigma_Y$ is the estimated standard deviation of the observed data
$Y$ and $R_Y = Y_{[0.75n]} - Y_{[0.25n]}$ is the inter-quartile range.
Silverman's rule of thumb [18] uses the factor $a_0 = 0.9$, while Scott [17]
suggested using $a_0 = 1.06$. In the rest of the paper we use $a_0 = 1.06$.

From equation (8), $\sigma_j\lambda^{1/2}$ plays the role of a variable
bandwidth in our estimator, based on the measurement error-inflated pseudo data
sets $Y(\lambda_1)$. In order to determine $\lambda_{1,rot}$ we set

$$h_{Y(\lambda_1)} = \bar\sigma_U\,\lambda_1^{1/2} = c_0\,h_Y,$$

where we choose $c_0 = \sqrt{\hat\sigma_Y^2 + \bar\sigma_U^2}/\hat\sigma_Y$ and
$\bar\sigma_U$ is the average of the heteroscedastic standard deviations of the
measurement errors, i.e. $\bar\sigma_U = \sum_{j=1}^{n}\sigma_j/n$.
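
The rule-of-thumb choice can be sketched as follows. This is a hedged illustration that assumes the rule reads $h_Y = a_0 \min\{\hat\sigma_Y, R_Y/1.34\}\,n^{-1/5}$ with $a_0 = 1.06$, and that $\lambda_1$ is obtained by solving $\bar\sigma_U\lambda_1^{1/2} = c_0 h_Y$ with $c_0 = \sqrt{\hat\sigma_Y^2 + \bar\sigma_U^2}/\hat\sigma_Y$. The function names are hypothetical, and `np.percentile` uses linear interpolation for the quartiles, which may differ slightly from an order-statistic definition of $R_Y$.

```python
import numpy as np

def rot_bandwidth(y, a0=1.06):
    """Rule-of-thumb bandwidth a0 * min(sd, IQR/1.34) * n^(-1/5)."""
    y = np.asarray(y, dtype=float)
    sd = y.std(ddof=1)
    q75, q25 = np.percentile(y, [75, 25])
    return a0 * min(sd, (q75 - q25) / 1.34) * len(y) ** (-0.2)

def rot_lambda1(y, sigma, a0=1.06):
    """Solve sigma_bar * sqrt(lambda_1) = c0 * h_Y for lambda_1 (assumed form)."""
    y = np.asarray(y, dtype=float)
    sd_y = y.std(ddof=1)
    sigma_bar = float(np.mean(sigma))
    c0 = np.sqrt(sd_y**2 + sigma_bar**2) / sd_y
    return (c0 * rot_bandwidth(y, a0) / sigma_bar) ** 2

# Small deterministic example (assumed data, for illustration only).
y_demo = np.arange(1.0, 101.0)
sigma_demo = np.full(100, 2.0)
h_demo = rot_bandwidth(y_demo)
lam1_demo = rot_lambda1(y_demo, sigma_demo)
print(h_demo, lam1_demo)
```

Given `lam1_demo`, the grid used in the extrapolation step would then be `np.linspace(lam1, lam1 + 3, 50)` under the typical choice $s = 50$ and $\lambda_s = \lambda_1 + 3$.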

4. Simulation study 

In this section, the finite sample performance of the proposed SIMEX estimate 
is investigated via a simulation study. Our study involves the following three 
densities, representing typical features that can be encountered in practice: 

(1) $X \sim N(0, 1)$, a standard normal distribution.

(2) $X \sim Gamma(2, 1)$, a Gamma distribution that is right skewed.

(3) $X \sim 0.5N(-2, 1) + 0.5N(2, 1)$, a normal mixture that is bimodal.

To assess the quality of a density estimator $\hat f$, we consider its
integrated squared error (ISE) with respect to the true density $f$:

$$ISE(\hat f) = \int (\hat f(x) - f(x))^2\,dx.$$
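
In practice the ISE integral has to be approximated numerically. A minimal sketch, assuming a trapezoid rule on a fine grid that covers the effective support of both densities, is given below; the helper names `ise` and `normal_pdf` and the grid limits are illustrative assumptions. The example compares two unit-variance normal densities one unit apart, for which the ISE is $(1 - e^{-1/4})/\sqrt{\pi}$ in closed form.

```python
import numpy as np

def ise(f_hat, f_true, grid):
    """Trapezoid-rule approximation of int (f_hat(x) - f_true(x))^2 dx."""
    d2 = (f_hat(grid) - f_true(grid)) ** 2
    return np.sum(0.5 * (d2[1:] + d2[:-1]) * np.diff(grid))

def normal_pdf(mean, sd):
    """Return the N(mean, sd^2) density as a function of x."""
    return lambda x: np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

grid = np.linspace(-10.0, 11.0, 4201)
ise_shift = ise(normal_pdf(1.0, 1.0), normal_pdf(0.0, 1.0), grid)
ise_exact = (1.0 - np.exp(-0.25)) / np.sqrt(np.pi)
print(ise_shift, ise_exact)
```

Any estimator that can be evaluated on a grid, including the SIMEX, DKE and naive estimates, can be passed as `f_hat` in the same way.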

The means and their standard errors of the ISEs of our SIMEX estimate, Fourier- 
type DKE, the naive estimate as well as the estimate obtained by using uncon- 
taminated X are compared below under the three typical distributions above 
for samples of sizes 50, 100, 250, and 1000, respectively. Figures of estimated 
curves below provide detailed pictures of the estimates in the entire domain of 
X. We have carried out more extensive simulation experiments by considering 
more complex target densities and other selections of the error variances than we 
can present here. Fortunately, all simulation experiments showed similar con- 
clusions about the performance of the SIMEX estimator. All algorithms and 
simulations were implemented in R/Splus. Full results of the simulation study 
and R functions can be obtained from the authors upon request. 

4.1. The case of homoscedastic errors

From each of the densities, 1000 samples of size $n = 50$, 100, 250, and
1000 were generated, each of which was then contaminated by a sample from
a normal density $N(0, \sigma_U^2)$. For each configuration, the parameter
$\sigma_U$ was chosen equal to 0.2, 0.4, 0.6 and 0.8, respectively.
Choice of kernel. From the studies of the DKE, it is known that the kernel
should be selected among densities whose characteristic function has a compact
and symmetric support [11; 26; 5]. The use of such kernels guarantees the
existence of the density estimator in (4). The most common example of such a
kernel is given by the second-order kernel characteristic function

$$\varphi_K(t) = (1 - t^2)^3\,I_{[-1,1]}(t),$$

where $I_{[-1,1]}(t)$ is the indicator function. The corresponding kernel
function is

$$K(x) = \frac{48\cos x}{\pi x^4}\left(1 - \frac{15}{x^2}\right) - \frac{144\sin x}{\pi x^5}\left(2 - \frac{5}{x^2}\right).$$

This was the kernel we used in our simulation.
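
The closed-form kernel can be cross-checked against a direct Fourier inversion. The sketch below assumes the characteristic function $(1 - t^2)^3$ on $[-1, 1]$ and evaluates $K(x) = \pi^{-1}\int_0^1 \cos(tx)(1 - t^2)^3\,dt$ by the trapezoid rule; the function names and the grid size are illustrative assumptions.

```python
import numpy as np

def kernel_closed_form(x):
    """K(x) = 48 cos(x)/(pi x^4) (1 - 15/x^2) - 144 sin(x)/(pi x^5) (2 - 5/x^2)."""
    x = np.asarray(x, dtype=float)
    return (48.0 * np.cos(x) / (np.pi * x**4) * (1.0 - 15.0 / x**2)
            - 144.0 * np.sin(x) / (np.pi * x**5) * (2.0 - 5.0 / x**2))

def kernel_by_inversion(x, n_grid=4001):
    """Numerical inversion of the assumed phi_K(t) = (1 - t^2)^3 on [-1, 1]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.linspace(0.0, 1.0, n_grid)
    f = np.cos(np.outer(x, t)) * (1.0 - t**2) ** 3
    dt = t[1] - t[0]
    # The integrand is even in t, so integrate over [0, 1] and divide by pi.
    return dt * (f.sum(axis=1) - 0.5 * (f[:, 0] + f[:, -1])) / np.pi

x_pts = np.array([0.5, 1.0, 2.0, 5.0])
closed = kernel_closed_form(x_pts)
numeric = kernel_by_inversion(x_pts)
print(closed)
print(numeric)
```

The closed form suffers from cancellation near $x = 0$, so an implementation would typically switch to the numerical inversion, or to the limit $K(0) = 16/(35\pi)$, for very small $|x|$.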

Choice of bandwidth. Delaigle and Gijbels [6] showed that an appropriately
chosen plug-in asymptotic bandwidth selector performed better than a
cross-validated bandwidth selector and other bandwidth selectors. We therefore
used the plug-in asymptotic bandwidth

following [11; 6], where $c_0 = 1.05$ in our simulation.

Results are shown in Table 1, in which entries without parentheses are the
means and entries with parentheses are the standard errors of ISEs from our
simulations of size 1000. The SIMEX estimate performs uniformly better than
the DKE in terms of the ISE criterion. When the sample size increases, the
means of ISEs of both the SIMEX estimate and the DKE get closer to the means
of ISEs of the kernel estimate from the uncontaminated sample. The SIMEX
estimate works particularly well for moderate sample sizes and large error
variances. The standard errors of both the SIMEX estimate and the DKE are
within a reasonable range.

[Table 1 about here.] 

Figure 1 shows an example of deconvolution density estimation for the case
of homoscedastic measurement errors. $X$ is generated from $N(0, 1)$, and two
levels of errors, $\sigma_U = 0.2$ and $\sigma_U = 0.8$, and four sample sizes,
$n = 50$, 100, 250, and 1000, are considered. In the figure, the solid line
denotes the kernel estimate from the uncontaminated sample $X$; the dashed line
denotes the estimate by the SIMEX method; the dotted line denotes the estimate
by the DKE method. Both the SIMEX and DKE methods recover the modes and capture
the shape of the true densities accurately for the large sample sizes. However,
the DKE method is not stable when the sample size becomes smaller. With the
small error variance (sub-plot (a)), the DKE method shows wiggly curves for the
small and moderate sample sizes. This is due to the selection of the kernel. A
similar situation was also noted by [11]. A careful selection of the kernel
function of the DKE may improve the results. In the cases of large error
variance and/or small sample sizes, the DKE tends to underestimate the peaks
while SIMEX works better.

[Figure 1 about here.] 
4.2. The case of heteroscedastic errors

In the case of heteroscedastic errors, measurement errors were generated from
$N(0, \sigma_j^2)$, where $\sigma_j$ ($j = 1, \ldots, n$) were generated from a
uniform distribution, $U(a, b)$; $(a, b)$ was chosen to be (0.2, 0.4),
(0.4, 0.6), (0.6, 0.8) and (0.8, 1), respectively. Due to the heavy
computational burden of the Fourier-type method in the case of heteroscedastic
errors, we considered a simulation study of size 500 rather than 1000 for each
of the cases under sample sizes $n = 50$, 100, 250, and 1000, respectively. It
took several hours to finish a single case study using Delaigle and Meister's
Fourier-type method, while our SIMEX procedure took only a few minutes. Of
course, a more efficient algorithm may be developed to speed up the computation
of a Fourier-type estimate, but that was not our objective. Our simulation
study of size 500 was already informative in comparing the performance of the
SIMEX procedure and the Fourier-type procedure, as shown in Table 2 and
Figures 2 and 3.
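
The data-generating step of the simulation design can be sketched as follows. This is an illustrative reading of the design described above, with hypothetical function names; `Gamma(2, 1)` is taken to mean shape 2 and scale 1, and the homoscedastic design is recovered as the special case $a = b$.

```python
import numpy as np

def draw_x(density, n, rng):
    """Draw n values of X from one of the three target densities."""
    if density == "normal":
        return rng.normal(0.0, 1.0, n)
    if density == "gamma":
        return rng.gamma(shape=2.0, scale=1.0, size=n)
    if density == "mixture":
        centers = rng.choice([-2.0, 2.0], size=n)   # equal mixing weights
        return rng.normal(centers, 1.0)
    raise ValueError(f"unknown density: {density}")

def contaminate(x, a, b, rng):
    """Return Y_j = X_j + U_j with U_j ~ N(0, sigma_j^2), sigma_j ~ U(a, b)."""
    sigma = rng.uniform(a, b, size=len(x))
    y = x + sigma * rng.standard_normal(len(x))
    return y, sigma

rng = np.random.default_rng(5)
x_mix = draw_x("mixture", 1000, rng)
y_het, sigma_het = contaminate(x_mix, 0.2, 0.4, rng)   # heteroscedastic
y_hom, sigma_hom = contaminate(x_mix, 0.6, 0.6, rng)   # homoscedastic, a = b
```

Each simulated pair `(y, sigma)` is what a heteroscedastic density estimator would receive as input; the unobserved `x_mix` is kept only to compute the benchmark kernel estimate.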

The Fourier-type estimate we compared with was Delaigle and Meister's adjusted
Fourier estimate for density estimation with heteroscedastic errors [8].
Their adjusted estimator can be written in the form of a kernel density estimator,

where 

2tt J ipuj [t/h) (puj (~t) 

Table 2 shows the means and the standard errors of ISEs of the SIMEX
estimate, Delaigle and Meister's Fourier-type estimate, the naive estimate and
the standard kernel estimate based on uncontaminated data from 500 simulated
samples for different cases. As in the case of homoscedastic errors, the
simulation results show that the SIMEX method performs uniformly better
than the adjusted DKE method in terms of the ISE. Comparing with Table 1, we
find that the means and standard errors in the case of heteroscedastic errors
are slightly larger than those in the case of homoscedastic errors. The results
are not unexpected, because heteroscedasticity of the measurement errors brings
more uncertainty and variation into the estimation than homoscedastic errors do.

[Table 2 about here.] 

Figure 2 and Figure 3 display two examples of deconvolution density
estimation in the case of heteroscedastic errors. Two types of distributions
are considered: a $Gamma(2, 1)$ distribution and a normal mixture
$0.5N(-2, 1) + 0.5N(2, 1)$. The errors are from $N(0, \sigma_j^2)$ where
(a) $\sigma_j \sim U(0.2, 0.4)$, or (b) $\sigma_j \sim U(0.8, 1)$. The SIMEX
estimates are closer to the true densities than the adjusted DKE estimates are,
especially at peaks and valleys. Despite the complex models, we see that the
SIMEX estimates perform quite well in recovering the true densities, and they
are even competitive with the kernel estimates that are based on uncontaminated
data.

[Figure 2 about here.] 

[Figure 3 about here.] 

5. A real data application 

Our research was motivated by a real data analysis of astronomical data
measured with errors. It is known that most astronomical data come with
information subject to measurement errors. Sun et al. [24] gave an excellent
introduction to the motivation of measurement error problems in astronomy.
Studying the distribution of one-dimensional velocities of stars originating in
a given galaxy is of interest in astronomy. In this section, we illustrate our
proposed method with an application to the astronomical position-velocity data
in [16] from a sample of 26 low surface brightness (LSB) galaxies. The data
contain 510 observed velocities of stars in km/s (relative to center, corrected
for inclination) from 26 LSB galaxies. Each observation includes the estimated
standard deviation of its measurement error. Sub-plot (a) of Figure 4 displays
the histogram of the measurement error standard deviations $\sigma_j$. The
standard deviations vary from 0.1 to 46.8 km/s and the mean is 6.34 km/s. The
distribution is clearly skewed.

If we ignore the measurement errors, the velocity data look quite normal.
The data range from -289.00 to 300.20, the mean is -1.41 and the median
is -1.00. We applied the SIMEX method and the adjusted DKE method to the
data. The resulting estimated densities are shown in sub-plot (b) of Figure
4. The two corrected estimates are consistent with each other, but not with the
naive estimate. The density around zero is higher, and a small bump is detected
on the left side of the curves (where the velocity is approximately equal to
-250 km/s) by both corrected estimates. This cannot be clearly seen from the
naive estimate. Astronomers are mostly interested in this substructure of
galaxies and can conduct further studies, which could lead to new discoveries.

[Figure 4 about here.] 

6. Discussion 

We presented a fast algorithm using the SIMEX method to recover the unknown
density when the data are contaminated with heteroscedastic errors and
compared it with the Fourier-type method. The SIMEX estimate has advantages
over the Fourier-type method in terms of ISE and computational burden. We did
not directly compare the SIMEX method with the Bayesian method of Staudenmayer
et al. [20], because they discussed their method under a different model
setting where the observed data are the sample means of replicates
contaminated with heteroscedastic errors. We note that the Bayesian method
has a simulation performance similar to that of the adjusted DKE method in
terms of the ISE criterion under the model setting of [20] (see Table 1 of
their paper), and it is also computationally intensive.

Although the Fourier method has many nice theoretical properties, the SIMEX
method is an excellent alternative in real data analysis. It is easy to
implement and computationally more efficient. For example, the SIMEX method can
also be used in classification analysis of microarray data with measurement
error. In such a case, applying a Fourier-type method is slow because we are
facing deconvolution problems with thousands of genes.



Appendix 

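Proof of Proposition 2.1:
By (8),

$$\begin{aligned}
E(g^*(t, \lambda)) &= \frac{1}{n}\sum_{j=1}^{n} E\left[\frac{1}{\sigma_j\lambda^{1/2}}\,\phi\left(\frac{t - Y_j}{\sigma_j\lambda^{1/2}}\right)\right] \\
&= \frac{1}{n}\sum_{j=1}^{n}\int\!\!\int \frac{1}{\sigma_j\lambda^{1/2}}\,\phi\left(\frac{t - x - \sigma_j u}{\sigma_j\lambda^{1/2}}\right) f_X(x)\,\phi(u)\,dx\,du \\
&= \frac{1}{n}\sum_{j=1}^{n}\int\!\!\int f_X(t - \sigma_j u - \sigma_j\lambda^{1/2}v)\,\phi(v)\,\phi(u)\,dv\,du \qquad (12) \\
&= \frac{1}{n}\sum_{j=1}^{n} E\left[f_X(t - \sigma_j(1 + \lambda)^{1/2}Z)\right],
\end{aligned}$$

where $Z \sim N(0, 1)$. Hence under conditions (b) and (c),

$$\lim_{\lambda \to -1} E(g^*(t, \lambda)) = \lim_{\lambda \to -1}\frac{1}{n}\sum_{j=1}^{n} E\left[f_X(t - \sigma_j(1 + \lambda)^{1/2}Z)\right] = f_X(t).$$

So, by (9), $\hat f_{X,SIMEX}(t) = \hat\beta_0 - \hat\beta_1 + \hat\beta_2 \to \beta_0 - \beta_1 + \beta_2 = f_X(t)$.
Under condition (a), $\hat f_{X,SIMEX}(t)$ is thus asymptotically unbiased and
consistent.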

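The asymptotic variance can be calculated from the extrapolation step.
Indeed, $g^*(t, \lambda)$ can be treated as a kernel estimator of $g(t)$ with
adaptive bandwidth $\sigma_j\lambda^{1/2}$, as shown in (8). Hence the
asymptotic variance of this estimator is

$$Var(g^*(t, \lambda)) = \frac{1}{n^2}\sum_{j=1}^{n}\frac{1}{\sigma_j\lambda^{1/2}}\,E\left(\phi(V)\,f_X(t - \sigma_j(U + \lambda^{1/2}V))\right) - \frac{1}{n^2}\sum_{j=1}^{n}\left[E\left(f_X(t - \sigma_j(1 + \lambda)^{1/2}V)\right)\right]^2$$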
by tracing the same arguments as those in (12), where $U$ and $V$ are
independent standard normal random variables. When the measurement error
variances are small, the above variance can be approximated by



Var{g*it)) = 



where $\bar\sigma_U = \frac{1}{n}\sum_{j=1}^{n}\sigma_j$ is the average of the
standard deviations, and $\sigma_H = n / \sum_{j=1}^{n}\sigma_j^{-1}$ is the
harmonic average of the standard deviations. The harmonic
average, from its definition, characterizes the underlying data in a way similar
to their minimum. The approximation error of the above calculation is 0(— ). 
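The covariance between $g^*(t, \lambda)$ at two different values $\lambda_l$
and $\lambda_m$ is

$$\begin{aligned}
cov(g^*(t, \lambda_l), g^*(t, \lambda_m)) &= \frac{1}{n^2}\sum_{j=1}^{n}\Bigg\{\frac{1}{\sqrt{2\pi}\,\sigma_j\sqrt{\lambda_l + \lambda_m}}\,E\left[f_X\left(t - \sigma_j\left(U + \left(\frac{\lambda_l\lambda_m}{\lambda_l + \lambda_m}\right)^{1/2}V\right)\right)\right] \\
&\qquad - E\left[f_X(t - \sigma_j(1 + \lambda_l)^{1/2}V)\right]E\left[f_X(t - \sigma_j(1 + \lambda_m)^{1/2}V)\right]\Bigg\} \\
&= \frac{f_X(t)}{n\sqrt{2\pi}\,\sigma_H}\,\frac{1}{\sqrt{\lambda_l + \lambda_m}} + O\left(\frac{1}{n}\right).
\end{aligned}$$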



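Let $c(\Lambda) = (1, -1, 1)(P^T P)^{-1}P^T$. Therefore, by (10), the
asymptotic variance of our SIMEX density estimator is

$$Var(\hat f_{X,SIMEX}(t)) = Var\left(\frac{1}{n}\sum_{j=1}^{n} G_j(t \mid Y_j, \sigma_j, \Lambda)\right) = \frac{f_X(t)}{n\sqrt{2\pi}\,\sigma_H}\,c(\Lambda)\,\Sigma_a\,c(\Lambda)^T,$$

where $\Sigma_a$ is an $s \times s$ matrix with $(l, m)$th element equal to
$(\lambda_l + \lambda_m)^{-1/2}$. This completes the proof.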
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Table 1 

The means and standard errors of ISEs for simulation in the case of homoscedastic error. 
Simulation size is 1000. Entries without parentheses are the means and entries with 
parentheses are the standard errors of ISEs. 



Density 


n 








ISE 












SIMEX 


fx 


fy 




oliViiljJs. 


fx 


fy 


DKE 


Normal 








=0.2 








=0.4 






50 


0.01040 


0.01085 


0.01147 




n ni fi7n 

U.UIU 1 u 


0.01114 


0.01333 


0.01160 






(0.00030) 


(0.00026) 


(0.00027) 


( r\ c\r\c\Ar\\ 
(U.UUU4U j 


/ r\ r\r\r\A a \ 
(U.UUU44j 


(0.00026) 


(0.00027) 


(0.00027) 




100 


0.00604 


0.00647 


0.00686 


U.UiOO± 


n nnQ7s 
u.uuy 1 o 


0.00646 


0.00885 


0.00784 






(0.00017) 


(0.00015) 


(0.00016) 


(U.UUUz4) 


I r\ n n n o 'T \ 

(U.UUUzdj 


(0.00013) 


(0.00017) 


(0.00018) 




250 


0.00300 


0.00327 


0.00361 


U.UUD 1 / 


U.UU'4:00 


0.00308 


0.00520 


0.00442 






(0.00007) 


(0.00006) 


(0.00007) 


/ r\ n/~in i t ^ 




(0.00006) 


(0.00009) 


(0.00009) 




1000 


0.00103 


0.00108 


0.00135 


n nn9'^zi 


n nni 79 
u.uui 1 z 


0.00111 


0.00289 


0.00215 






(0.00002) 


(0.00002) 


(0.00002) 
=0.6 


i r\ c\r\(\r\o\ 
(U.UUUUo j 


/ r\ r\r\r\r\ a \ 

(U.UUUU4j 


(0.00002) 


(0.00004) 
=0.8 


(0.00004) 




50 


0.02159 


0.01148 


0.01776 


U.UiOoO 


U.UZ'4:00 


0.01151 


0.02446 


0.02902 






(0.00059) 


(0.00028) 


(0.00034) 


(U.UUUoU ) 


(U.UUUoo ) 


(0.00026) 


(0.00042) 


(0.00033) 




100 


0.01230 


0.00637 


0.01312 




n ni 499 

U. U l'4:ZZ 


0.00634 


0.02033 


0.02363 






(0.00030) 


(0.00014) 


(0.00024) 


(U.UUU22j 




(0.00014) 


(0.00029) 


(0.00024) 




250 


0.00600 


0.00314 


0.00938 


U.UUOO / 


U.UUooO 


0.00329 


0.01711 


0.01817 






(0.00013) 


(0.00006) 


(0.00013) 


(U.UUU12) 




(0.00006) 


(0.00020) 


(0.00017) 




1000 


0.00236 


0.00112 


0.00703 




u.uuoyo 


0.00113 


0.01423 


0.01270 






(0.00005) 


(0.00002) 


(0.00006) 


(U.UUUUu ) 


(U.UUUUTj 


(0.00002) 


(0.00009) 


(0.00008) 


Density: Gamma

       $\sigma_U = 0.2$                                    $\sigma_U = 0.4$
n      SIMEX        f_X          f_Y          DKE          SIMEX        f_X          f_Y          DKE
50     0.01222      0.01187      0.01250      [illegible]  [illegible]  0.01157      0.01469      0.01357
       (0.00030)    (0.00027)    (0.00027)    [illegible]  [illegible]  (0.00025)    (0.00028)    (0.00028)
100    0.00738      0.00716      0.00793      [illegible]  [illegible]  0.00698      0.01019      0.00906
       (0.00015)    (0.00014)    (0.00015)    [illegible]  [illegible]  (0.00013)    (0.00017)    (0.00016)
250    0.00420      0.00406      0.00477      [illegible]  [illegible]  0.00421      0.00737      0.00614
       (0.00007)    (0.00007)    (0.00007)    [illegible]  [illegible]  (0.00008)    (0.00010)    (0.00010)
1000   0.00181      0.00164      0.00228      [illegible]  [illegible]  0.00164      0.00451      0.00334
       (0.00002)    (0.00002)    (0.00003)    [illegible]  [illegible]  (0.00002)    (0.00004)    (0.00004)

       $\sigma_U = 0.6$                                    $\sigma_U = 0.8$
n      SIMEX        f_X          f_Y          DKE          SIMEX        f_X          f_Y          DKE
50     0.01977      0.01143      0.01834      [illegible]  [illegible]  0.01195      0.02350      0.02512
       (0.00053)    (0.00025)    (0.00033)    [illegible]  [illegible]  (0.00027)    (0.00038)    (0.00031)
100    0.01234      0.00713      0.01426      [illegible]  [illegible]  0.00741      0.02006      0.02112
       (0.00028)    (0.00014)    (0.00022)    [illegible]  [illegible]  (0.00014)    (0.00026)    (0.00022)
250    0.00756      0.00406      0.01124      [illegible]  [illegible]  0.00400      0.01681      0.01689
       (0.00013)    (0.00007)    (0.00013)    [illegible]  [illegible]  (0.00007)    (0.00016)    (0.00014)
1000   0.00409      0.00164      0.00853      [illegible]  [illegible]  0.00166      0.01424      0.01298
       (0.00005)    (0.00002)    (0.00006)    (0.00005)    (0.00009)    (0.00003)    (0.00008)    (0.00007)


Density: Mixture

       $\sigma_U = 0.2$                                    $\sigma_U = 0.4$
n      SIMEX        f_X          f_Y          DKE          SIMEX        f_X          f_Y          DKE
50     0.01448      0.01001      0.01058      0.02146      0.00864      0.01021      0.01224      0.01005
       (0.00011)    (0.00012)    (0.00012)    (0.00029)    (0.00015)    (0.00012)    (0.00013)    (0.00018)
100    0.00979      0.00715      0.00771      0.01299      0.00551      0.00727      0.00954      0.00675
       (0.00008)    (0.00008)    (0.00009)    (0.00016)    (0.00010)    (0.00008)    (0.00010)    (0.00012)
250    0.00472      0.00418      0.00473      0.00659      0.00264      0.00423      0.00643      0.00375
       (0.00005)    (0.00005)    (0.00005)    (0.00007)    (0.00005)    (0.00005)    (0.00006)    (0.00006)
1000   0.00115      0.00174      0.00216      0.00234      0.00088      0.00173      0.00357      0.00162
       (0.00002)    (0.00002)    (0.00002)    (0.00002)    (0.00002)    (0.00002)    (0.00003)    (0.00002)

       $\sigma_U = 0.6$                                    $\sigma_U = 0.8$
n      SIMEX        f_X          f_Y          DKE          SIMEX        f_X          f_Y          DKE
50     0.01020      0.01033      0.01508      0.01103      0.01390      0.01033      0.01834      0.01664
       (0.00020)    (0.00012)    (0.00015)    (0.00019)    (0.00026)    (0.00012)    (0.00016)    (0.00017)
100    0.00651      0.00717      0.01218      0.00794      0.01005      0.00748      0.01609      0.01378
       (0.00013)    (0.00008)    (0.00011)    (0.00013)    (0.00019)    (0.00008)    (0.00012)    (0.00014)
250    0.00368      0.00428      0.00921      0.00541      0.00604      0.00431      0.01299      0.01029
       (0.00007)    (0.00005)    (0.00007)    (0.00008)    (0.00011)    (0.00005)    (0.00009)    (0.00010)
1000   0.00153      0.00172      0.00622      0.00324      0.00325      0.00175      0.01001      0.00713
       (0.00003)    (0.00002)    (0.00004)    (0.00003)    (0.00005)    (0.00002)    (0.00005)    (0.00005)

Table 2 

The means and standard errors of ISEs for simulation in the case of heteroscedastic error. 
Simulation size is 500. Entries without parentheses are the means and entries with 
parentheses are the standard errors of ISEs. 



Density: Normal

       ISE
       $\sigma_U \sim U(0.2, 0.4)$                         $\sigma_U \sim U(0.4, 0.6)$
n      SIMEX        f_X          f_Y          DKE          SIMEX        f_X          f_Y          DKE
50     0.01496      [illegible]  [illegible]  [illegible]  [illegible]  0.01093      0.01506      0.01275
       (0.00064)    [illegible]  [illegible]  [illegible]  [illegible]  (0.00037)    (0.00045)    (0.00042)
100    0.00846      [illegible]  [illegible]  [illegible]  [illegible]  0.00674      0.01076      0.00911
       (0.00027)    [illegible]  [illegible]  [illegible]  [illegible]  (0.00021)    (0.00029)    (0.00028)
250    0.00429      [illegible]  [illegible]  [illegible]  [illegible]  0.00320      0.00729      0.00597
       (0.00013)    [illegible]  [illegible]  [illegible]  [illegible]  (0.00009)    (0.00017)    (0.00016)
1000   0.00153      [illegible]  [illegible]  [illegible]  [illegible]  0.00113      0.00432      0.00306
       (0.00005)    [illegible]  [illegible]  [illegible]  [illegible]  (0.00004)    (0.00009)    (0.00008)

       $\sigma_U \sim U(0.6, 0.8)$                         $\sigma_U \sim U(0.8, 1)$
n      SIMEX        f_X          f_Y          DKE          SIMEX        f_X          f_Y          DKE
50     0.02318      [illegible]  [illegible]  [illegible]  [illegible]  0.01150      0.02995      0.03816
       (0.00101)    [illegible]  [illegible]  [illegible]  [illegible]  (0.00040)    (0.00066)    (0.00043)
100    0.01394      [illegible]  [illegible]  [illegible]  [illegible]  0.00629      0.02553      0.03167
       (0.00043)    [illegible]  [illegible]  [illegible]  [illegible]  (0.00018)    (0.00048)    (0.00036)
250    0.00685      [illegible]  [illegible]  [illegible]  [illegible]  0.00315      0.02153      0.02416
       (0.00020)    [illegible]  [illegible]  [illegible]  [illegible]  (0.00009)    (0.00015)    (0.00013)
1000   0.00285      [illegible]  [illegible]  [illegible]  [illegible]  0.00113      0.01885      0.01774
       (0.00009)    [illegible]  [illegible]  [illegible]  [illegible]  (0.00004)    (0.00019)    (0.00016)


Density: Gamma

       $\sigma_U \sim U(0.2, 0.4)$                         $\sigma_U \sim U(0.4, 0.6)$
n      SIMEX        f_X          f_Y          DKE          SIMEX        f_X          f_Y          DKE
50     0.01301      [illegible]  [illegible]  [illegible]  [illegible]  0.01113      0.01593      0.01397
       (0.00045)    [illegible]  [illegible]  [illegible]  [illegible]  (0.00031)    (0.00042)    (0.00039)
100    0.00861      [illegible]  [illegible]  [illegible]  [illegible]  0.00733      0.01247      0.01080
       (0.00025)    [illegible]  [illegible]  [illegible]  [illegible]  (0.00021)    (0.00027)    (0.00024)
250    0.00436      [illegible]  [illegible]  [illegible]  [illegible]  0.00386      0.00866      0.00723
       (0.00011)    [illegible]  [illegible]  [illegible]  [illegible]  (0.00010)    (0.00016)    (0.00015)
1000   0.00197      [illegible]  [illegible]  [illegible]  [illegible]  0.00165      0.00618      0.00474
       (0.00004)    [illegible]  [illegible]  [illegible]  [illegible]  (0.00003)    (0.00008)    (0.00007)

       $\sigma_U \sim U(0.6, 0.8)$                         $\sigma_U \sim U(0.8, 1)$
n      SIMEX        f_X          f_Y          DKE          SIMEX        f_X          f_Y          DKE
50     0.02210      [illegible]  [illegible]  [illegible]  [illegible]  0.01217      0.02686      0.03016
       (0.00077)    [illegible]  [illegible]  [illegible]  [illegible]  (0.00037)    (0.00055)    (0.00043)
100    0.01403      [illegible]  [illegible]  [illegible]  [illegible]  0.00739      0.02318      0.02578
       (0.00041)    [illegible]  [illegible]  [illegible]  [illegible]  (0.00020)    (0.00040)    (0.00032)
250    0.00896      [illegible]  [illegible]  [illegible]  [illegible]  0.00419      0.02018      0.02122
       (0.00023)    (0.00009)    (0.00021)    [illegible]  [illegible]  (0.00011)    (0.00026)    (0.00021)
1000   0.00518      0.00161      0.01098      [illegible]  [illegible]  0.00163      0.01735      0.01634
       (0.00010)    (0.00003)    (0.00011)    (0.00009)    [illegible]  [missing]    [missing]    [missing]
b: Error distribution is $N(0, 0.8^2)$.

Fig 1. Deconvolution density estimation in the case of heteroscedastic error: the true
density is $N(0, 1)$ and measurement errors are from (a) $N(0, 0.2^2)$ and (b) $N(0, 0.8^2)$
with different sample sizes. For both sub-plots (a) and (b), n = 50 (top left panel),
n = 100 (top right panel), n = 250 (bottom left panel) and n = 1000 (bottom right
panel). Solid line - kernel estimate from the uncontaminated sample X; dashed line - estimate
by SIMEX method; dotted line - estimate by DKE method.
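
As a rough illustration of how ISE summaries of this kind (a mean with its standard error over replications) can be computed, the sketch below scores the two kernel estimates that need no deconvolution: the estimate from the uncontaminated sample X and the naive estimate from the contaminated sample Y. Everything in it is an assumption made for illustration and not the paper's procedure: the Gaussian kernel, the normal-reference bandwidth, the 40 replications, the Riemann-sum integration grid and all function names. SIMEX and DKE are not implemented here.

```python
import math
import random
import statistics


def gaussian_pdf(x, mu=0.0, sd=1.0):
    """Density of N(mu, sd^2) at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def kde(sample, grid, h):
    """Ordinary Gaussian kernel density estimate evaluated on a grid."""
    n = len(sample)
    return [sum(gaussian_pdf((x - s) / h) for s in sample) / (n * h) for x in grid]


def ise(estimate, truth, step):
    """Riemann-sum approximation of the integrated squared error."""
    return step * sum((e - t) ** 2 for e, t in zip(estimate, truth))


def reference_bandwidth(sample):
    # Assumed normal-reference rule; the paper uses its own bandwidth selector.
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)


def mean_and_se(values):
    """Mean and standard error of the mean over replications."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def simulate(n=100, sigma_u=0.8, reps=40, seed=0):
    rng = random.Random(seed)
    step = 0.1
    grid = [-5.0 + step * i for i in range(101)]
    truth = [gaussian_pdf(x) for x in grid]  # true density N(0, 1)
    ise_x, ise_y = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [xi + rng.gauss(0.0, sigma_u) for xi in x]  # Y = X + U
        ise_x.append(ise(kde(x, grid, reference_bandwidth(x)), truth, step))
        ise_y.append(ise(kde(y, grid, reference_bandwidth(y)), truth, step))
    return mean_and_se(ise_x), mean_and_se(ise_y)


(mean_x, se_x), (mean_y, se_y) = simulate()
print(f"f_X: {mean_x:.5f} ({se_x:.5f})")
print(f"f_Y: {mean_y:.5f} ({se_y:.5f})")
```

The two printed lines follow the tables' layout of a mean followed by its standard error in parentheses. Because of the assumed bandwidth rule and the small number of replications, the values are not expected to match the tabulated ones.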
b: Error distribution is $N(0, \sigma_j^2)$ with $\sigma_j \sim U(0.8, 1)$.

Fig 2. Deconvolution density estimation in the case of heteroscedastic error: the
true density is Gamma(2, 1) and measurement errors are from (a) $N(0, \sigma_j^2)$,
$\sigma_j \sim U(0.2, 0.4)$ and (b) $N(0, \sigma_j^2)$, $\sigma_j \sim U(0.8, 1)$ with different
sample sizes. For both sub-plots (a) and (b), n = 50 (top left panel), n = 100 (top right
panel), n = 250 (bottom left panel) and n = 1000 (bottom right panel). Solid line - kernel
estimate from the uncontaminated sample X; dashed line - estimate by SIMEX method;
dotted line - estimate by adjusted DKE method.
b: Error distribution is $N(0, \sigma_j^2)$ with $\sigma_j \sim U(0.8, 1)$.

Fig 3. Deconvolution density estimation in the case of heteroscedastic error: the true
density is $0.5N(-2, 1) + 0.5N(2, 1)$ and measurement errors are from (a) $N(0, \sigma_j^2)$,
$\sigma_j \sim U(0.2, 0.4)$ and (b) $N(0, \sigma_j^2)$, $\sigma_j \sim U(0.8, 1)$ with different
sample sizes. For both sub-plots (a) and (b), n = 50 (top left panel), n = 100 (top right
panel), n = 250 (bottom left panel) and n = 1000 (bottom right panel). Solid line - kernel
estimate from the uncontaminated sample X; dashed line - estimate by SIMEX method;
dotted line - estimate by adjusted DKE method.
a: Histogram of measurement errors (axis label: measurement error).

b: Density estimation (axis label: velocity).

Fig 4. Density estimation of velocities in astronomical position-velocity data. The solid
line is the naive estimate ignoring the heteroscedastic measurement errors. Two corrected
estimates are considered here: SIMEX method (dashed line) and DKE method (dotted line).



